table representation


Texts or Images? A Fine-grained Analysis on the Effectiveness of Input Representations and Models for Table Question Answering

Zhou, Wei, Mesgar, Mohsen, Adel, Heike, Friedrich, Annemarie

arXiv.org Artificial Intelligence

In table question answering (TQA), tables are encoded as either texts or images. Prior work suggests that passing images of tables to multi-modal large language models (MLLMs) performs comparably to or even better than using textual input with large language models (LLMs). However, the lack of controlled setups limits fine-grained distinctions between these approaches. In this paper, we conduct the first controlled study on the effectiveness of several combinations of table representations and models from two perspectives: question complexity and table size. We build a new benchmark based on existing TQA datasets. In a systematic analysis of seven pairs of MLLMs and LLMs, we find that the best combination of table representation and model varies across setups. We propose FRES, a method selecting table representations dynamically, and observe a 10% average performance improvement compared to using both representations indiscriminately.
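The abstract does not describe FRES's selection policy, but the core idea of routing each (question, table) pair to a text or image representation can be sketched with a purely illustrative heuristic. The thresholds, keyword list, and function name below are my own assumptions, not the paper's method:

```python
# Hypothetical sketch of dynamic representation selection in the spirit of
# FRES: route each (question, table) pair to a text-based LLM or an
# image-based MLLM. The routing rules and thresholds are illustrative
# assumptions only.

def select_representation(question: str, n_rows: int, n_cols: int) -> str:
    """Return 'text' or 'image' for a given question/table pair."""
    large_table = n_rows * n_cols > 200
    multi_hop = any(kw in question.lower()
                    for kw in ("difference", "compare", "average", "total"))
    # Assumption: very large tables overflow text context windows, so render
    # them as images; everything else stays textual, where LLMs tend to
    # handle multi-hop arithmetic questions more reliably.
    if large_table:
        return "image"
    return "text"

print(select_representation("What is the total revenue?", 500, 8))  # image
print(select_representation("Who won in 2010?", 10, 4))             # text
```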


HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization

Liu, Zhenghao, Wang, Haolan, Li, Xinze, Xiong, Qiushi, Yang, Xiaocui, Gu, Yu, Yan, Yukun, Shi, Qi, Li, Fangfang, Yu, Ge, Sun, Maosong

arXiv.org Artificial Intelligence

Tabular data contains rich structural semantics and plays a crucial role in organizing and manipulating information. To better capture these structural semantics, this paper introduces the HybrId-modal Preference oPtimizatiOn (HIPPO) model, which represents tables using both text and image, and optimizes MLLMs to effectively learn more comprehensive table information from these multiple modalities. Specifically, HIPPO samples model responses from hybrid-modal table representations and designs a modality-consistent sampling strategy to enhance response diversity and mitigate modality bias during DPO training. Experimental results on table question answering and table fact verification tasks demonstrate the effectiveness of HIPPO, achieving a 4% improvement over various table reasoning models. Further analysis reveals that HIPPO not only enhances reasoning abilities based on unimodal table representations but also facilitates the extraction of crucial and distinct semantics from different modal representations. All data and code are available at https://github.com/NEUIR/HIPPO.


Evaluation of Table Representations to Answer Questions from Tables in Documents : A Case Study using 3GPP Specifications

Roychowdhury, Sujoy, Soman, Sumit, Ranjani, HG, Sharma, Avantika, Gunda, Neeraj, Bala, Sai Krishna

arXiv.org Artificial Intelligence

With the ubiquitous use of document corpora for question answering, one important aspect, especially relevant for technical documents, is the ability to extract information from tables that are interspersed with text. The major challenge is that, unlike free-flowing text or isolated sets of tables, it is not obvious what constitutes a relevant chunk when representing a table. We conduct a series of experiments examining various representations of tabular data interspersed with text to understand the relative benefits of different representations. We choose a corpus of $3^{rd}$ Generation Partnership Project (3GPP) documents since they are heavily interspersed with tables. We create an expert-curated dataset of question-answer pairs to evaluate our approach. We conclude that row-level representations, with the corresponding table header information included in every cell, improve retrieval performance by leveraging the structural information present in the tabular data.
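The winning representation in this study, one retrieval chunk per row with the column header repeated inside every cell, is easy to sketch. The serialization format and field names below are illustrative, not the paper's exact encoding:

```python
# Minimal sketch of a row-level table representation: each row becomes one
# retrieval chunk, and every cell carries its column header so the chunk is
# self-describing. The "header: value" format is an illustrative choice.

def rows_with_headers(headers, rows):
    """Serialize each row as 'header: value' pairs, one chunk per row."""
    return [
        "; ".join(f"{h}: {v}" for h, v in zip(headers, row))
        for row in rows
    ]

headers = ["Parameter", "Value", "Unit"]
rows = [["maxHARQ-Tx", "8", "transmissions"],
        ["t300", "1000", "ms"]]
for chunk in rows_with_headers(headers, rows):
    print(chunk)
# Parameter: maxHARQ-Tx; Value: 8; Unit: transmissions
# Parameter: t300; Value: 1000; Unit: ms
```

Each chunk can then be embedded and indexed independently, so a retriever can surface a single relevant row without losing its header context.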


Tabular Data Augmentation for Machine Learning: Progress and Prospects of Embracing Generative AI

Cui, Lingxi, Li, Huan, Chen, Ke, Shou, Lidan, Chen, Gang

arXiv.org Artificial Intelligence

Machine learning (ML) on tabular data is ubiquitous, yet obtaining abundant high-quality tabular data for model training remains a significant obstacle. Numerous works have focused on tabular data augmentation (TDA) to enhance the original table with additional data, thereby improving downstream ML tasks. Recently, there has been a growing interest in leveraging the capabilities of generative AI for TDA. Therefore, we believe it is time to provide a comprehensive review of the progress and future prospects of TDA, with a particular emphasis on the trending generative AI. Specifically, we present an architectural view of the TDA pipeline, comprising three main procedures: pre-augmentation, augmentation, and post-augmentation. Pre-augmentation encompasses preparation tasks that facilitate subsequent TDA, including error handling, table annotation, table simplification, table representation, table indexing, table navigation, schema matching, and entity matching. Augmentation systematically analyzes current TDA methods, categorized into retrieval-based methods, which retrieve external data, and generation-based methods, which generate synthetic data. We further subdivide these methods based on the granularity of the augmentation process at the row, column, cell, and table levels. Post-augmentation focuses on the datasets, evaluation and optimization aspects of TDA. We also summarize current trends and future directions for TDA, highlighting promising opportunities in the era of generative AI. In addition, the accompanying papers and related resources are continuously updated and maintained in the GitHub repository at https://github.com/SuDIS-ZJU/awesome-tabular-data-augmentation to reflect ongoing advancements in the field.
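The survey's three-stage pipeline (pre-augmentation, augmentation, post-augmentation) can be expressed as a schematic skeleton. The concrete steps below are placeholders of my own invention; real systems plug retrieval- or generation-based methods into the middle stage:

```python
# Schematic skeleton of the three-stage TDA pipeline the survey describes.
# Each stage body is a toy placeholder, not a method from the survey.

def pre_augment(table):
    # pre-augmentation placeholder, e.g. error handling: drop rows with
    # missing cells
    return [row for row in table if all(v is not None for v in row)]

def augment(table):
    # generation-based placeholder: a real system would synthesize new rows
    # with a generative model; here we simply duplicate existing rows
    return table + [list(row) for row in table]

def post_augment(table):
    # post-augmentation placeholder: evaluate/optimize, here deduplication
    seen, out = set(), []
    for row in table:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

table = [[1, 2], [3, None], [1, 2]]
result = post_augment(augment(pre_augment(table)))
print(result)  # [[1, 2]]
```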


RTF: Region-based Table Filling Method for Relational Triple Extraction

An, Ning, Hei, Lei, Jiang, Yong, Meng, Weiping, Hu, Jingjing, Huang, Boran, Ren, Feiliang

arXiv.org Artificial Intelligence

Relational triple extraction is crucial for the automatic construction of knowledge graphs. Existing methods construct only shallow representations at the token or token-pair level. Moreover, previous works ignore the local spatial dependencies of relational triples, resulting in weak entity-pair boundary detection. To tackle this problem, we propose a novel Region-based Table Filling method (RTF). We devise a novel region-based tagging scheme and a bi-directional decoding strategy, which regard each relational triple as a region on the relation-specific table and identify triples by determining the two endpoints of each region. We also introduce convolution to construct region-level table representations from a spatial perspective, making triples easier to capture. In addition, we share partial tagging scores among different relations to improve the learning efficiency of the relation classifier. Experimental results show that our method achieves state-of-the-art performance with better generalization capability on three variants of two widely used benchmark datasets.


Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs

Deng, Naihao, Sun, Zhenjie, He, Ruiqi, Sikka, Aman, Chen, Yulong, Ma, Lin, Zhang, Yue, Mihalcea, Rada

arXiv.org Artificial Intelligence

Recent years have witnessed an explosion of Large Language Models (LLMs), with impressive performance on various Natural Language Processing (NLP) tasks (Brown et al., 2020; Touvron et al., 2023; Team et al., 2023). Research to date has examined the performance of LLMs for various aspects and abilities (Bang et al., 2023b; Bubeck et al., 2023; Akter et al., 2023), but their effectiveness on structured data such as tables is less explored. Unlike unstructured text, tables are systematically organized structures holding a large amount of information. Specifically, we investigate several research questions, including the effectiveness of image-based representations of tabular data and how different text-based or image-based prompt methods affect LLMs' performance on table-related tasks. In addition, we provide analysis and hypotheses of LLMs' behaviors. Our findings include: LLMs maintain decent performance when we use image-based table representations; sometimes, image-based table representations can even make LLMs perform better.
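One of the text-based prompt formats such comparisons typically include is a markdown serialization of the table (the image-based alternative renders the same table to a picture instead). A minimal serializer, with formatting choices of my own:

```python
# Sketch of a text-based table representation for LLM prompting: serialize
# the table as a GitHub-flavored markdown table. The exact format is one of
# several text encodings such studies compare.

def to_markdown(headers, rows):
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    lines += ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join(lines)

md = to_markdown(["Team", "Wins"], [["Ants", 3], ["Bees", 5]])
print(md)
# | Team | Wins |
# | --- | --- |
# | Ants | 3 |
# | Bees | 5 |
```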


Polynomial-based Self-Attention for Table Representation learning

Kim, Jayoung, Shin, Yehjin, Choi, Jeongwhan, Wi, Hyowon, Park, Noseong

arXiv.org Artificial Intelligence

Structured data, which constitutes a significant portion of existing data types, has been a long-standing research topic in the field of machine learning. Various representation learning methods for tabular data have been proposed, ranging from encoder-decoder structures to Transformers. Among these, Transformer-based methods have achieved state-of-the-art performance not only in tabular data but also in various other fields, including computer vision and natural language processing. However, recent studies have revealed that self-attention, a key component of Transformers, can lead to an oversmoothing issue. We show that Transformers for tabular data also face this problem, and to address the problem, we propose a novel matrix polynomial-based self-attention layer as a substitute for the original self-attention layer, which enhances model scalability. In our experiments with three representative table learning models equipped with our proposed layer, we illustrate that the layer effectively mitigates the oversmoothing problem and enhances the representation performance of the existing methods, outperforming the state-of-the-art table representation methods.
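The core substitution can be illustrated numerically: instead of applying the row-stochastic attention matrix once (plain weighted averaging, which drives token representations together at depth), mix several powers of it with coefficients. The fixed coefficients and NumPy formulation below are my own illustration, not the paper's parameterization:

```python
import numpy as np

# Numerical sketch of a matrix-polynomial self-attention layer: apply
# p(A) = c0*I + c1*A + c2*A^2 to the values instead of A alone. The c0*I
# term preserves some of each token's own signal, counteracting the pull
# toward identical representations. Coefficients are illustrative only.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def poly_attention(Q, K, V, coeffs=(0.2, 0.5, 0.3)):
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))          # standard attention matrix
    P = sum(c * np.linalg.matrix_power(A, i)             # matrix polynomial in A
            for i, c in enumerate(coeffs))
    return P @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = poly_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Since each power of a row-stochastic matrix is itself row-stochastic, a convex combination of powers stays a valid averaging operator while mixing information over multiple hop distances.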


Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval

Jin, Nengzheng, Li, Dongfang, Chen, Junying, Siebert, Joanna, Chen, Qingcai

arXiv.org Artificial Intelligence

Open-domain table question answering aims to provide answers to a question by retrieving and extracting information from a large collection of tables. Existing studies of open-domain table QA either directly adopt text retrieval methods or consider the table structure only in the encoding layer for table retrieval, which may cause syntactical and structural information loss during table scoring. To address this issue, we propose a syntax- and structure-aware retrieval method for the open-domain table QA task. It provides syntactical representations for the question and uses the structural header and value representations for the tables to avoid the loss of fine-grained syntactical and structural information. Then, a syntactical-to-structural aggregator is used to obtain the matching score between the question and a candidate table by mimicking the human retrieval process. Experimental results show that our method achieves state-of-the-art results on the NQ-tables dataset and substantially outperforms strong baselines on a newly curated open-domain Text-to-SQL dataset.
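The general shape of the idea, scoring a question against separate header and value representations rather than one flat table embedding, can be illustrated with a toy scorer. The cosine similarity, max-pooling, and sum aggregation here are stand-ins of my own, not the paper's syntactical-to-structural aggregator:

```python
import numpy as np

# Toy illustration of structure-aware table scoring: keep header and value
# representations separate, match the question against each, and combine.
# The max+sum aggregation is an illustrative placeholder.

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def table_score(q_vec, header_vecs, value_vecs):
    h = max(cos(q_vec, h) for h in header_vecs)   # best-matching header
    v = max(cos(q_vec, v) for v in value_vecs)    # best-matching value
    return h + v

rng = np.random.default_rng(1)
q = rng.normal(size=16)
score = table_score(q, rng.normal(size=(3, 16)), rng.normal(size=(5, 16)))
print(round(score, 3))
```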


RoTaR: Efficient Row-Based Table Representation Learning via Teacher-Student Training

Chen, Zui, Cao, Lei, Madden, Sam

arXiv.org Artificial Intelligence

We propose RoTaR, a row-based table representation learning method, to address the efficiency and scalability issues faced by existing table representation learning methods. The key idea of RoTaR is to generate query-agnostic row representations that can be re-used via query-specific aggregation. In addition to the row-based architecture, we introduce several techniques: cell-aware position embeddings, a teacher-student training paradigm, and selective backward, to improve the performance of the RoTaR model.
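The caching pattern behind the key idea can be sketched: encode each row once into a query-agnostic vector, then reuse those cached vectors across queries via query-specific attention. The mean-pool "row encoder" and softmax aggregator below are placeholders for the learned components:

```python
import numpy as np

# Sketch of the RoTaR reuse pattern as described in the abstract: row
# encoding happens once per table (query-agnostic), aggregation happens once
# per query. Both components here are toy placeholders for learned models.

def encode_rows(row_token_embs):
    """Query-agnostic: one vector per row, computed once and cached."""
    return np.stack([toks.mean(axis=0) for toks in row_token_embs])

def aggregate(row_vecs, q_vec):
    """Query-specific: softmax attention over the cached row vectors."""
    logits = row_vecs @ q_vec
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ row_vecs

rng = np.random.default_rng(2)
rows = [rng.normal(size=(5, 8)) for _ in range(4)]   # 4 rows, 5 tokens each
cache = encode_rows(rows)                            # reused for every query
rep = aggregate(cache, rng.normal(size=8))
print(cache.shape, rep.shape)  # (4, 8) (8,)
```

Because the expensive per-row encoding is amortized across queries, per-query cost reduces to a cheap attention pass over cached vectors.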


Scalable and Interpretable Data Representation for High-Dimensional, Complex Data

Kim, Been (Massachusetts Institute of Technology) | Patel, Kayur (Google) | Rostamizadeh, Afshin (Google) | Shah, Julie (Massachusetts Institute of Technology)

AAAI Conferences

The majority of machine learning research has been focused on building models and inference techniques with sound mathematical properties and cutting-edge performance. Little attention has been devoted to developing data representations that improve a user's ability to interpret the data and machine learning models when solving real-world problems. In this paper, we quantitatively and qualitatively evaluate an efficient, accurate and scalable feature-compression method using latent Dirichlet allocation for discrete data. This representation can effectively communicate the characteristics of high-dimensional, complex data points. We show that the improvement in a user's interpretability through the use of a topic modeling-based compression technique is statistically significant, according to a number of metrics, when compared with other representations. Also, we find that this representation is scalable --- it maintains alignment with human classification accuracy as an increasing number of data points are shown. In addition, the learned topic layer can deliver semantically meaningful information to users that could potentially aid human reasoning about data characteristics in connection with the compressed topic space.
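The compression step itself, representing each high-dimensional discrete data point by its topic proportions under an LDA model, can be sketched with scikit-learn. The toy corpus size and topic count are arbitrary choices of mine:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Sketch of LDA-based feature compression: fit a topic model on discrete
# count data, then represent each data point by its (much lower-dimensional)
# topic-proportion vector. Corpus and topic count are toy choices.

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(20, 100))        # 20 docs over a 100-word vocab
lda = LatentDirichletAllocation(n_components=3, random_state=0)
topic_props = lda.fit_transform(X)            # shape (20, 3); rows sum to 1
print(topic_props.shape)  # (20, 3)
```

Each 100-dimensional count vector is summarized by three interpretable topic weights, which is the compressed view users inspect in the study.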